Target for Antiviral Therapy

The present invention provides a crystallised module of a nuclear phosphoprotein and
an assay and method for determining interactions with human papillomavirus E2 for
use in drug design, particularly but not exclusively in designing antiviral agents
with potential use in treating warts, proliferative skin lesions and carcinoma of
the cervix.



Background to the Invention



Human papillomaviruses (HPVs) cause warts and proliferative lesions in skin and
other epithelia. In a minority of HPV types ("high risk", which include HPVs 16, 18,
31, 33, 45 and 56), further transformation of the wart lesions can produce tumours,
most notably carcinoma of the cervix 1. HPVs have evolved a sophisticated system of
control, mediated by protein:DNA and protein:protein interactions, that involves both
cellular and viral proteins. The 45 kDa nuclear phosphoprotein, E2, has two
central roles in this control. It acts as the principal virally encoded transcription
factor and, in association with the viral E1 protein, it creates the molecular complex
at the origin of viral DNA replication 2.

E2 has three distinct modules. The N-terminal module (E2NT) of about 200 amino
acids is responsible for interactions with viral and host cell transcription factors. It is
followed by a flexible, proline-rich linker module and a C-terminal module (E2CT),
each of about 100 amino acids 3 (Fig. 1a). The E2CT binds as a homodimer to DNA
sites with a consensus sequence of ACCGN4CGGT 4. In most HPVs a long upstream
regulatory region (URR) precedes the viral genes and contains four spatially
conserved E2 binding sites: three sites proximal to the transcription start site (p97 in
HPV 16) and one approximately 500 bp upstream.
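
The consensus sequence quoted above can be searched for in a DNA sequence with a regular expression. The following is a minimal sketch only: the function name and the toy sequence are hypothetical and are not taken from any HPV genome. Because the consensus is its own reverse complement, only one strand needs to be scanned.

```python
import re

# Consensus E2 binding site from the text: ACCG, any four bases, then CGGT.
# The lookahead lets overlapping candidate sites be reported as well.
E2_SITE = re.compile(r"(?=(ACCG[ACGT]{4}CGGT))")


def find_e2_sites(sequence):
    """Return (0-based start, matched 12-mer) for every consensus match."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in E2_SITE.finditer(seq)]


# Hypothetical toy sequence for illustration; not a real URR.
toy = "ttgaACCGAAATCGGTtgcaACCGTTTTCGGTaa"
for start, site in find_e2_sites(toy):
    print(start, site)
```

Spacing between reported sites could then be compared with the arrangement of three promoter-proximal sites and one distal site described above.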



The dimer of E2CT serves to anchor E2 protein to its recognition sites on the DNA;
the function of the E2NT is to bind and localise at least three cellular transcription
factors, Sp1, TFIIB and AMF-1, to the transcription initiation complex. In addition,
E2 interacts with another viral protein, E1, which has ATPase and helicase activities.
E1 itself binds to the viral origin of replication, which consists of about 100 bp and is
surrounded by the three E2-binding sites proximal to the transcription start. The
E2:E1 interaction greatly increases the rate of HPV genome replication 2,5,6 (Fig. 1a).
An intact E2 is essential for the normal productive (wart) life cycle of HPV; however,
during malignant progression HPV DNA is integrated into the host cell genome,
which usually results in disruption of the E2/E1 ORFs and loss of E2 protein, in turn
leading to dysregulated expression of the viral oncogenes E6 and E7 7.

Consistent with its role as a transcription regulator, E2 has been shown to direct the
formation of loops in DNA containing E2 binding sites 8. The loops were only
formed with intact E2, and not with the E2CT alone. The E2 binding sites did not
function independently and their co-operative effect was mediated by full length E2,
leading the authors to suggest that there were specific interactions mediated by E2
that bridged across the set of DNA binding sites through its N-terminal module. A similar
DNA loop structure could also be achieved with Sp1, a cellular transcription factor,
which forms a complex with distally bound E2 9; Sp1/E2 interactions are critical for
transcription activation in BPV 10.

Eighty-six known E2 proteins from different species and different human subtypes 11
are highly conserved, with sequence identities typically of 35% in the N- and
C-terminal modules (Fig. 1b). The crystal structure of the E2CT has been determined
both alone and in complex with cognate DNA 12-14. The module is a dimer with a
barrel fold, and induces substantial bending (42-44°) of the DNA from its B-form
double helix 14.

The structure of the proteolytic fragment of HPV 18 E2NT, missing 65 N-terminal
residues, was recently reported at 2.1 Å spacing 15. This allowed some analysis of
mutational effects on function, although the missing 65 amino acids contain residues
which are essential for the transcriptional and replication activities of the protein.
We report herein the structure of the complete E2NT determined by X-ray analysis at
1.9 Å. We have found that it is an L-shaped molecule with the residues vital for
transcriptional and replication activities of the protein lying on opposite sides of the
N-terminal domain. Surprisingly, our results show that the surface vital for
transcription activation is in fact involved in association of two E2NTs into a dimer.
We suggest that dimerisation of E2NT plays a key role in induction
of DNA loop formation, the mechanism by which distally bound transcription factors
would be brought close to the site of transcription initiation. More importantly, our
results raise the possibility that dimer formation serves as a molecular switch
between early gene expression and viral genome replication during HPV infection.

The process of rationalised drug design requires no explanation or teaching for the
skilled person, but a brief description of computational design is given here for the lay
reader: various computational analyses are necessary to determine whether a
molecule is sufficiently similar to the target moiety or structure. Such analyses may
be carried out in current software applications, such as the Molecular Similarity
application of QUANTA (Molecular Simulations Inc., Waltham, Mass.) version 3.3,
and as described in the accompanying User's Guide, Volume 3, pages 134-135.

The Molecular Similarity application permits comparisons between different
structures, different conformations of the same structure, and different parts of the
same structure. The procedure used in Molecular Similarity to compare structures is
divided into four steps: 1) load the structures to be compared; 2) define the atom
equivalences in these structures; 3) perform a fitting operation; and 4) analyze the
results.

Each structure is identified by a name. One structure is identified as the target (i.e.,
the fixed structure); all remaining structures are working structures (i.e., moving
structures). When a rigid fitting method is used, the working structure is translated
and rotated to obtain an optimum fit with the target structure. The fitting operation
uses a least squares fitting algorithm that computes the optimum translation and
rotation to be applied to the moving structure, such that the root mean square
difference of the fit over the specified pairs of equivalent atoms is an absolute
minimum. This number, given in angstroms, is reported by QUANTA.
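
The rigid least squares fit described above can be illustrated with a short, dependency-free sketch. It is not the QUANTA implementation, whose internals are not described here. It assumes Horn's quaternion formulation of the problem with a Jacobi eigen-solver, equal atom weights, and at least three non-collinear equivalent atom pairs supplied in matching order. All function names and coordinates are hypothetical.

```python
import math


def _matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def _jacobi_eigen(sym, sweeps=50):
    """Eigenvalues and eigenvectors (as columns) of a small symmetric matrix."""
    n = len(sym)
    a = [row[:] for row in sym]
    vecs = [[float(i == j) for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row) or 1.0
    for _ in range(sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off <= 1e-28 * scale:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) off-diagonal element.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                rot = [[float(i == j) for j in range(n)] for i in range(n)]
                rot[p][p] = rot[q][q] = c
                rot[p][q], rot[q][p] = s, -s
                rot_t = [list(col) for col in zip(*rot)]
                a = _matmul(rot_t, _matmul(a, rot))
                vecs = _matmul(vecs, rot)
    return [a[i][i] for i in range(n)], vecs


def rigid_fit(moving, target):
    """Superpose `moving` onto `target`; return (fitted coordinates, RMSD)."""
    n = len(moving)
    cm = [sum(p[k] for p in moving) / n for k in range(3)]
    ct = [sum(p[k] for p in target) / n for k in range(3)]
    a = [[p[k] - cm[k] for k in range(3)] for p in moving]
    b = [[p[k] - ct[k] for k in range(3)] for p in target]
    # Cross-covariance S[x][y] = sum over atom pairs of a_x * b_y.
    s = [[sum(a[i][x] * b[i][y] for i in range(n)) for y in range(3)]
         for x in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    horn = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    vals, vecs = _jacobi_eigen(horn)
    best = max(range(4), key=lambda i: vals[i])
    # The eigenvector of the largest eigenvalue is the optimal unit quaternion.
    q0, qx, qy, qz = (vecs[r][best] for r in range(4))
    rot = [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]
    fitted = [[sum(rot[r][c] * p[c] for c in range(3)) + ct[r] for r in range(3)]
              for p in a]
    sq = sum((f[k] - t[k]) ** 2 for f, t in zip(fitted, target) for k in range(3))
    return fitted, math.sqrt(sq / n)


# Hypothetical coordinates in angstroms: the moving set is the target rotated
# by 30 degrees about z and then translated.
target = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0),
          (0.3, 1.0, 2.0), (2.1, -0.7, 1.1)]
ang = math.radians(30.0)
moving = [(x * math.cos(ang) - y * math.sin(ang) + 3.0,
           x * math.sin(ang) + y * math.cos(ang) - 2.0,
           z + 5.0) for x, y, z in target]
fitted, rmsd = rigid_fit(moving, target)
print(f"RMSD after fit: {rmsd:.6f} angstroms")
```

The four steps named earlier map onto this sketch as follows: the two coordinate lists are the loaded structures, their shared ordering defines the atom equivalences, `rigid_fit` performs the fitting operation, and the returned RMSD is the result to be analysed.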

One skilled in the art may use one of several methods to screen chemical entities or
fragments for their ability to associate with a target. Again, these methods require no
elucidation for the skilled person but are described here for the benefit of the
unskilled reader. The screening process may begin by visual inspection of the target
on the computer screen, generated from a machine-readable storage medium.
Selected fragments or chemical entities may then be positioned in a variety of
orientations, or docked, within that binding pocket as defined supra. Docking may be
accomplished using software such as Quanta and Sybyl, followed by energy
minimization and molecular dynamics with standard molecular mechanics force
fields, such as CHARMM and AMBER.

Specialized computer programs may also assist in the process of selecting fragments 
or chemical entities. These include: 

1. GRID (P. J. Goodford, "A Computational Procedure for Determining Energetically
Favorable Binding Sites on Biologically Important Macromolecules", J. Med. Chem.,
28, pp. 849-857 (1985)). GRID is available from Oxford University, Oxford, UK.

2. MCSS (A. Miranker et al., "Functionality Maps of Binding Sites: A Multiple Copy
Simultaneous Search Method." Proteins: Structure, Function and Genetics, 11, pp.
29-34 (1991)). MCSS is available from Molecular Simulations, Burlington, Mass.

3. AUTODOCK (D. S. Goodsell et al., "Automated Docking of Substrates to
Proteins by Simulated Annealing", Proteins: Structure, Function, and Genetics, 8, pp.
195-202 (1990)). AUTODOCK is available from Scripps Research Institute, La Jolla,
Calif.






PCT/GB00/03568 



WO 01/21645 






4. DOCK (I. D. Kuntz et al., "A Geometric Approach to Macromolecule-Ligand 
Interactions", J. Mol. Biol., 161, pp. 269-288 (1982)). DOCK is available from 
University of California, San Francisco, Calif. 

Once suitable chemical entities or fragments have been selected, they can be
assembled into a single compound or complex. Assembly may be preceded by visual
inspection of the relationship of the fragments to each other on the three-dimensional
image displayed on a computer screen in relation to the structure coordinates of
calcineurin. This would be followed by manual model building using software such
as Quanta or Sybyl.

Useful programs to aid one of skill in the art in connecting the individual chemical 
entities or fragments include: 

1. CAVEAT (P. A. Bartlett et al., "CAVEAT: A Program to Facilitate the
Structure-Derived Design of Biologically Active Molecules", in "Molecular Recognition in
Chemical and Biological Problems", Special Pub., Royal Chem. Soc., 78, pp. 182-196
(1989)). CAVEAT is available from the University of California, Berkeley,
Calif.

2. 3D Database systems such as MACCS-3D (MDL Information Systems, San
Leandro, Calif.). This area is reviewed in Y. C. Martin, "3D Database Searching in
Drug Design", J. Med. Chem., 35, pp. 2145-2154 (1992).

3. HOOK (available from Molecular Simulations, Burlington, Mass.).

As the skilled reader will already know, instead of proceeding to build a ligand for the
target in a step-wise fashion, one fragment or chemical entity at a time as described
above, inhibitory or other target-binding compounds may be designed as a whole or
de novo. These methods include:
1. LUDI (H.-J. Bohm, "The Computer Program LUDI: A New Method for the De
Novo Design of Enzyme Inhibitors", J. Comp. Aid. Molec. Design, 6, pp. 61-78
(1992)). LUDI is available from Biosym Technologies, San Diego, Calif.

2. LEGEND (Y. Nishibata et al., Tetrahedron, 47, p. 8985 (1991)). LEGEND is
available from Molecular Simulations, Burlington, Mass.

3. LeapFrog (available from Tripos Associates, St. Louis, Mo.).

Other molecular modelling techniques may also be employed. See, e.g., N. C. Cohen
et al., "Molecular Modeling Software and Methods for Medicinal Chemistry", J. Med.
Chem., 33, pp. 883-894 (1990). See also M. A. Navia et al., "The Use of Structural
Information in Drug Design", Current Opinions in Structural Biology, 2, pp. 202-210
(1992).

Once a compound has been designed or selected by the above methods, the efficiency
with which that entity may bind to a target may be tested and optimized by
computational evaluation. For example, an effective ligand will preferably
demonstrate a relatively small difference in energy between its bound and free states
(i.e., a small deformation energy of binding). Thus, the most efficient ligands should
preferably be designed with a deformation energy of binding of not greater than about
10 kcal/mole, preferably not greater than 7 kcal/mole. Ligands may interact with the
target in more than one conformation that is similar in overall binding energy. In
those cases, the deformation energy of binding is taken to be the difference between
the energy of the free entity and the average energy of the conformations observed
when the inhibitor binds to the protein.
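
The deformation energy criterion above amounts to simple arithmetic, sketched below. The function names, the sign convention (mean bound energy minus free energy) and the example energies are assumptions made for illustration. In practice the conformational energies would come from a modelling package.

```python
def deformation_energy(free_energy, bound_energies):
    """Mean energy of the bound conformations minus the free-state energy.

    All energies are in kcal/mol; `bound_energies` holds one value per
    conformation observed when the ligand binds to the target.
    """
    mean_bound = sum(bound_energies) / len(bound_energies)
    return mean_bound - free_energy


def classify_ligand(free_energy, bound_energies):
    """Apply the thresholds quoted in the text (about 10 and 7 kcal/mol)."""
    energy = deformation_energy(free_energy, bound_energies)
    if energy <= 7.0:
        return "preferred"
    if energy <= 10.0:
        return "acceptable"
    return "too strained"


# Hypothetical ligand with two bound conformations.
print(deformation_energy(-42.0, [-37.0, -35.0]))
print(classify_ligand(-42.0, [-37.0, -35.0]))
```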

An entity designed or selected as binding to a target may be further computationally
optimized so that in its bound state it would preferably lack repulsive electrostatic
interaction with the target enzyme. Such non-complementary (e.g., electrostatic)
interactions include repulsive charge-charge, dipole-dipole and charge-dipole
interactions. Specifically, the sum of all electrostatic interactions between the
inhibitor or other ligand and the target, when the inhibitor is bound to the target,
preferably makes a neutral or favourable contribution to the enthalpy of binding.
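
As a rough illustration of summing electrostatic interactions between a ligand and its target, the sketch below evaluates a pairwise Coulomb sum over point charges. It assumes fixed partial charges, a uniform dielectric constant and the conventional conversion factor of about 332.06 kcal·Å/(mol·e²). The function name and the coordinates are hypothetical, and a real force field such as CHARMM or AMBER would treat these terms in far more detail.

```python
import math

# Conventional factor converting e^2/angstrom to kcal/mol.
COULOMB_KCAL = 332.06


def electrostatic_energy(ligand, target, dielectric=1.0):
    """Sum of pairwise Coulomb energies (kcal/mol).

    Each atom is a tuple (x, y, z, partial_charge) with coordinates in
    angstroms and charge in units of the elementary charge.
    """
    total = 0.0
    for lx, ly, lz, lq in ligand:
        for tx, ty, tz, tq in target:
            r = math.dist((lx, ly, lz), (tx, ty, tz))
            total += COULOMB_KCAL * lq * tq / (dielectric * r)
    return total


# Hypothetical charges: a cationic ligand atom 4 angstroms from an anionic
# target atom gives a favourable (negative) contribution.
ligand = [(0.0, 0.0, 0.0, 1.0)]
target_site = [(4.0, 0.0, 0.0, -1.0)]
energy = electrostatic_energy(ligand, target_site)
print("favourable" if energy <= 0.0 else "repulsive", round(energy, 3))
```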

Specific computer software is available in the art to evaluate compound deformation
energy and electrostatic interaction. Examples of programs designed for such uses
include: Gaussian 92, revision C [M. J. Frisch, Gaussian, Inc., Pittsburgh, Pa.
© 1992]; AMBER, version 4.0 [P. A. Kollman, University of California at
San Francisco, © 1994]; QUANTA/CHARMM [Molecular Simulations,
Inc., Burlington, Mass. © 1994]; and Insight II/Discover [Biosym
Technologies Inc., San Diego, Calif. © 1994]. These programs may be
implemented, for instance, using a Silicon Graphics workstation, IRIS 4D/35 or IBM
RISC/6000 workstation model 550. Other hardware systems and software packages
will be known to those skilled in the art.

Once the ligand has been optimally selected or designed, as described above,
substitutions may then be made in some of its atoms or side groups in order to
improve or modify its binding properties. Generally, initial substitutions are
conservative, i.e., the replacement group will have approximately the same size,
shape, hydrophobicity and charge as the original group. It should, of course, be
understood that components known in the art to alter conformation should be
avoided. Such substituted chemical compounds may then be analyzed for efficiency
of fit to a calcineurin-like binding pocket by the same computer methods described in
detail above. Again, all these facts are familiar to the skilled person.

Another approach is the computational screening of small molecule databases for
chemical entities or compounds that can bind in whole, or in part, to a target. In this
screening, the quality of fit of such entities to the binding site may be judged either
by shape complementarity or by estimated interaction energy. See E. C. Meng et al., J.
Comp. Chem., 13, pp. 505-524 (1992).
The computational analysis and design of molecules, as well as software and
computer systems therefor, are described in US Patent No 5,978,740, which is
included herein by reference, including specifically but not by way of limitation the
computer system diagram described with reference to and illustrated in Fig 3 thereof
as well as the data storage media diagram described with reference to and illustrated
in Figs 4 and 5 thereof.

Statement of the Invention 

According to a first aspect of the invention there is provided a crystallised molecular
complex of an E2 N-terminal module (E2NT) dimer protein or homologue thereof,
for use in rationalised drug design. We have found that the dimer comprises residues
vital for transcriptional and replicational activities of said protein lying on opposite
sides of an N-terminal domain.

Preferably the E2NT dimer protein is substantially as depicted in any of Figures 2c 
and/or 3a-d. 

According to a second aspect of the invention there is provided an in vitro method for
identifying and/or selecting a candidate therapeutic agent, the method comprising
determining interaction of an E2 N-terminal module (E2NT) dimer in a sample by
contacting said sample with said candidate therapeutic agent and measuring DNA
loop formation.

Preferably, the method is for use in identifying and/or selecting an antiviral candidate
therapeutic agent.

Preferably, the candidate therapeutic agent interferes with or blocks interactions of E2NT
so as to interfere with or block viral and/or cellular transcription factors.
According to a third aspect of the invention there is provided use of an E2NT
dimerisation inhibitor in the preparation of a medicament for use in treating warts,
proliferative skin lesions and/or cervical cancer.

According to a fourth aspect of the invention there is provided a method of
monitoring the efficacy of an antiviral therapy in a patient receiving a medicament for
the treatment of warts, proliferative skin lesions and/or cervical cancer, comprising
taking a sample from said patient and measuring E2NT interactions and/or DNA loop
formation.

Thus it will be appreciated that a patient can be monitored at the start of therapy to 
test its effectiveness. Alternatively, a patient can be monitored once a therapy has 
been established so as to monitor its efficacy with a view to altering a therapy if 
found to be unsatisfactory. 


The human papillomavirus E2 protein controls the primary transcription and
replication of the viral genome. Both activities are governed by a ~200 amino acid
N-terminal module (E2NT) which is connected to a DNA binding C-terminal module
by a flexible linker. The crystal structure of the E2NT module from high-risk type 16
human papillomavirus reveals an L-shaped molecule with two closely packed
domains, each with a novel fold. It forms a dimer in the crystal and in solution. The
dimer structure is important in the interactions of E2NT with viral and cellular
transcription factors and is the key to induction of DNA loops by E2. These loops
may serve to target distal DNA-binding transcription factors to the region proximal to
the start of transcription. The structure has implications for antiviral drug design and
cervical cancer therapy.

The invention includes a method for identifying and/or selecting a candidate
therapeutic agent, comprising applying rationalised drug design to a crystal structure
obtainable by crystallising E2NT, cryogenically freezing the crystals and generating
the crystal structure using X-ray diffraction. The method by which the E2NT crystal
structure is obtainable may comprise crystallisation using hanging-drop vapour
diffusion. The method by which the E2NT crystal structure is obtainable may comprise
X-ray diffraction using uranium acetate and gold cyanide E2NT derivatives and
refining with data extending to 1.9 Å spacing. The crystal structure may comprise
the portions of amino acids Ile82, Glu90, Trp92, Lys112, Tyr138, Val145, Pro106,
Lys111, Phe168, Trp134, Trp33 and Leu94. The rationalised drug design may
comprise designing drugs which interact with the dimerisation surface of E2NT.

Further provided is a computer for producing a three-dimensional representation of a
molecule or molecular complex, wherein said molecule or molecular complex
comprises or a three-dimensional representation of a homologue of said molecule or
molecular complex, wherein said homologue comprises a binding pocket that has a
root mean square deviation from the backbone atoms of said amino acids of not more
than 1.5 Å, wherein said computer comprises:

(a) a machine-readable data storage medium comprising a data storage material
encoded with machine-readable data, wherein said data comprises the structure
coordinates of E2NT amino acids Ile82, Glu90, Trp92, Lys112, Tyr138, Val145,
Pro106, Lys111, Phe168, Trp134, Trp33 and Leu94 according to Table 3;

(b) a working memory for storing instructions for processing said machine-readable
data;

(c) a central-processing unit coupled to said working memory and to said
machine-readable data storage medium for processing said machine readable data into said
three-dimensional representation; and

(d) a display coupled to said central-processing unit, for displaying said
three-dimensional representation.
In one class of embodiments, the three-dimensional representation is of a molecule or
molecular complex defined by the set of structure coordinates according to Table
3, or said three-dimensional representation is of a homologue of said
molecule or molecular complex, said homologue having a root mean square
deviation from the backbone atoms of said amino acids of not more than 1.5 Å.
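
The 1.5 Å backbone criterion for a homologue can be checked along the following lines. This is a sketch under stated assumptions: the two structures are taken to be already superposed, backbone atoms are taken to be N, CA, C and O, and atoms are keyed by residue number and atom name. The function names and the coordinates are hypothetical and are not taken from Table 3.

```python
import math

BACKBONE = {"N", "CA", "C", "O"}  # assumed definition of backbone atoms


def backbone_rmsd(ref_atoms, model_atoms):
    """RMSD in angstroms over backbone atoms present in both structures.

    Each argument maps (residue_number, atom_name) to (x, y, z); the
    structures are assumed to have been superposed beforehand.
    """
    keys = [k for k in ref_atoms if k[1] in BACKBONE and k in model_atoms]
    if not keys:
        raise ValueError("no shared backbone atoms")
    sq = sum(math.dist(ref_atoms[k], model_atoms[k]) ** 2 for k in keys)
    return math.sqrt(sq / len(keys))


def within_homologue_limit(ref_atoms, model_atoms, limit=1.5):
    """True if the backbone RMSD does not exceed `limit` angstroms."""
    return backbone_rmsd(ref_atoms, model_atoms) <= limit


# Hypothetical coordinates for one residue; the side-chain atom CB is ignored.
ref = {(82, "N"): (0.0, 0.0, 0.0), (82, "CA"): (1.5, 0.0, 0.0),
       (82, "C"): (2.0, 1.5, 0.0), (82, "O"): (3.0, 1.5, 0.0),
       (82, "CB"): (1.5, -1.5, 0.0)}
model = {key: (x + 1.0, y, z) for key, (x, y, z) in ref.items()}
print(backbone_rmsd(ref, model), within_homologue_limit(ref, model))
```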

An additional aspect of the invention resides in a computer for determining at least a 
portion of the structure coordinates corresponding to an X-ray diffraction pattern of a 
molecule or molecular complex, wherein said computer comprises: 


(a) a machine-readable data storage medium comprising a data storage material 
encoded with machine-readable data, wherein said data comprises at least a portion 
of the structural coordinates according to Table 3; 

(b) a machine-readable data storage medium comprising a data storage material
encoded with machine-readable data, wherein said data comprises an X-ray
diffraction pattern of said molecule or molecular complex;

(c) a working memory for storing instructions for processing said machine-readable
data of (a) and (b);

(d) a central-processing unit coupled to said working memory and to said
machine-readable data storage medium of (a) and (b) for performing a Fourier transform of the
machine readable data of (a) and for processing said machine readable data of (b)
into structure coordinates; and

(e) a display coupled to said central-processing unit for displaying said structure 
coordinates of said molecule or molecular complex. 

A yet further aspect of the invention relates to a crystallised molecule or molecular
complex comprising a dimerisation surface defined by structure coordinates of E2NT
amino acids Ile82, Glu90, Trp92, Lys112, Tyr138, Val145, Pro106, Lys111, Phe168,
Trp134, Trp33 and Leu94 according to Table 3, or a homologue of said molecule or
molecular complex, wherein said homologue comprises a binding pocket that has a
root mean square deviation from the backbone atoms of said amino acids of not more
than 1.5 Å. The molecule or molecular complex may be defined by the set of
structure coordinates according to Table 3, or a homologue thereof, wherein said
homologue has a root mean square deviation from the backbone atoms of said amino
acids of not more than 1.5 Å.

Also provided is a machine-readable data storage medium (e.g. a magnetic or optical storage
medium, for example a hard disc, a floppy disc or a CD-ROM), comprising a data
storage material encoded with machine readable data which, when using a machine
programmed with instructions for using said data, is capable of displaying a graphical
three-dimensional representation of a molecule or molecular complex comprising a
dimerisation surface defined by structure coordinates of E2NT amino acids Ile82,
Glu90, Trp92, Lys112, Tyr138, Val145, Pro106, Lys111, Phe168, Trp134, Trp33 and
Leu94 according to Table 3, or a homologue of said molecule or molecular complex,
wherein said homologue comprises a binding pocket that has a root mean square
deviation from the backbone atoms of said amino acids of not more than 1.5 Å.

In the machine-readable data storage medium the molecule or molecular complex
may be defined by the set of structure coordinates according to Table 3, or a
homologue of said molecule or molecular complex, said homologue having a root
mean square deviation from the backbone atoms of said amino acids of not more than
1.5 Å.

The invention further provides a machine-readable data storage medium comprising a
data storage material encoded with a first set of machine readable data which, when
combined with a second set of machine readable data, using a machine programmed
with instructions for using said first set of data and said second set of data, can
determine at least a portion of the structure coordinates corresponding to the second
set of machine readable data, wherein: said first set of data comprises a Fourier
transform of at least a portion of the structural coordinates according to Table 3; and
said second set of data comprises an X-ray diffraction pattern of a molecule or
molecular complex.
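
The Fourier transform of a set of structure coordinates referred to above is, in crystallographic terms, a set of structure factors. The following is a simplified sketch only: it assumes fractional coordinates, a single constant scattering weight per atom, and no temperature factors or symmetry expansion. The function name and the atom list are hypothetical.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """Structure factor F(hkl) for atoms given as (x, y, z, weight).

    Coordinates are fractional (in units of the cell edges) and `weight`
    stands in for the atomic scattering factor.
    """
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for x, y, z, f in atoms)


# Two equal, hypothetical scatterers half a cell apart along x: the (1,0,0)
# contributions cancel while the (2,0,0) contributions reinforce.
atoms = [(0.0, 0.0, 0.0, 6.0), (0.5, 0.0, 0.0, 6.0)]
print(abs(structure_factor((1, 0, 0), atoms)))
print(abs(structure_factor((2, 0, 0), atoms)))
```

Calculated amplitudes of this kind from a known model are what would be compared with the amplitudes measured in a second diffraction pattern when the known coordinates are used to help solve a related structure.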


In another aspect, the invention resides in a method for evaluating the ability of a 
chemical entity to associate with a molecule or molecular complex according to the 
invention, comprising the steps of: 

a. employing computational means to perform a fitting operation between the
chemical entity and a dimerisation surface of the molecule or molecular complex;
and

b. analysing the results of said fitting operation to quantify the association
between the chemical entity and the dimerisation surface.
Detailed Description of the Invention

The invention will now be described by way of example only with reference to the 
following Figures and Tables wherein: 


Table 1 illustrates X-ray data and phasing statistics; 

Table 2 illustrates refinement and model correlation; 

Table 3 shows the structure coordinates of the E2NT module;

Figure 1a represents functional assignments of HPV 16 E2 protein;

Figure 1b illustrates sequence alignment of E2NT modules from a subset of HPV
types;

Figure 2a illustrates a stereo view of electron density with a final model at the dimer 
interface of the E2NT module, viewed down the crystallographic two-fold axis; 

Figure 2b represents a stereo ribbon diagram of the E2NT module;

Figure 2c represents the E2NT dimer; 

Figure 3a illustrates a schematic view of URR; 

Figure 3b illustrates a schematic view of loop formation induced by binding of E2 
proteins to two cognate sites; 

Figure 3c illustrates a model of E2 dimer formation;

Figure 3d illustrates loops within URR as shown in Figure 3b;

Figure 4a illustrates the distribution of conserved residues on the E2NT monomer;

Figure 4b illustrates a first cluster of conserved residues on the E2NT monomer;

Figure 4c illustrates a second cluster of conserved residues on the E2NT monomer; 
and 

Figure 4d illustrates conserved residues Gln12 and Glu39.

Those of skill in the art understand that a set of structure coordinates for an enzyme
or an enzyme-complex or a portion thereof, is a relative set of points that define a
shape in three dimensions. Thus, it is possible that an entirely different set of
coordinates could define a similar or identical shape. Moreover, slight variations
caused by acceptable errors in the individual coordinates will have little, if any, effect
on overall shape. In terms of binding pockets, these acceptable variations would not
be expected to alter the nature of ligands that could associate with those pockets.

The term "associating with" refers to a condition of proximity between a chemical
entity or compound, or portions thereof, and a calcineurin molecule or portions
thereof. The association may be non-covalent (wherein the juxtaposition is
energetically favored by hydrogen bonding or van der Waals or electrostatic
interactions) or it may be covalent.

The invention is also described with reference to US Patent No 5,978,740 which is
included herein by reference, including specifically but not by way of limitation the
computer system diagram described with reference to and illustrated in Fig 3 thereof
as well as the data storage media diagram described with reference to and illustrated
in Figs 4 and 5 thereof.
With reference to Figure 1a and the functional assignments of E2, there is shown a
schematic view of the NT, linker and CT modules of E2, indicating known functions of
each module. Amino acid numbers which delimit the modules correspond to E2
from HPV16. In Figure 1b, there is shown the sequence alignment of the E2NT
modules from a subset of HPV types (HPV16, HPV18, HPV11 and HPV2a) and one
BPV type. Shaded blocks above the alignment indicate the experimentally
determined secondary structure. Shaded blocks below the sequences indicate the
minimal peptide sequences involved in protein:protein interactions, suggested by
mutation studies. Residues with more than 90% identity among 86 PV types are
coloured: red for internal structural residues, green for residues within the fulcrum
region, blue for surface residues.

With reference to the structural features of E2, in Figure 2a there is shown a stereo
view of the electron density with the final model, at the dimer interface of the E2NT
module, viewed down the crystallographic two-fold axis. The likelihood weighted
map is contoured at the 1.5 σ level. Ribbons of two independent monomers are
coloured blue and yellow. Side chains of Arg37 and Ile73, which are known to be
critical for transactivation 4,31, are shown in dark green; side chains of other residues
at the dimer interface are shown in light green. Oxygen atoms are in red, nitrogen in
blue, water molecules are shown as orange spheres and hydrogen bonds as dashed
sticks. In Figure 2b, there is shown a stereo ribbon diagram of the E2NT module.
The N1 domain is shown in aquamarine and the N2 domain in pink, with the fulcrum
in green. In Figure 2c, there is shown the dimer of E2NT, showing the extent of the
interface between the two subunits. The view is as in Figure 2a but rotated clockwise
by 90°. Side chains of Gln12 and Glu39, which are critical for interactions with E1 31-33,37,
are shown in magenta. Side chains of residues at the dimer interface are
coloured as per Figure 2a.

With reference to Figures 3a-d there is shown loop formation in the URR of HPV16.
In Figure 3a, there is shown a schematic view of the URR. The four E2-binding sites
are represented by boxes. Numbers in italics indicate distances between individual
sites upstream of the p97 promoter. Two possible E2 configurations, with separate or
dimeric E2NT modules, are shown. In Figure 3b, there is shown a schematic view of
loop formation induced by binding of E2 proteins to two cognate sites, based on the
experiments reported by Knight et al. 8. In Figure 3d, there are shown the possible DNA
loops within the URR as depicted in Figure 3b. In Figure 3c, there is shown a model
of the formation of E2 dimers, showing interactions between both the C-terminal and
E2NT modules. The C-terminal dimer, with its bound DNA, is based on the crystal
structure of this module 12. The E2NT dimer is proposed from the present work. The
relative orientation and position of the E2NT and C-terminal modules is purely
schematic.

With reference to Figures 4a-d there are shown functionally important residues. In
Figure 4a, there is shown the distribution of conserved residues on the E2NT
monomer. In Figures 4b and 4c there are shown the two clusters of conserved residues
in the fulcrum of E2NT. In Figure 4d, there are shown conserved residues Gln12 and
Glu39. Bonds in ball-and-stick models are coloured aquamarine (N1 domain), pink
(N2 domain) and green (fulcrum). Hydrogen bonds are shown as dashed lines, water
molecules as orange spheres, oxygen atoms are in red, nitrogen atoms in blue and
sulphur atoms in yellow.

There is convincing evidence that the E2 protein has an extended structure, is flexible
and that its functions depend on this property. This is probably the reason why the
intact protein has not yet been crystallised in spite of intensive efforts. A major
problem is the extended flexible linker module, with around 100 residues. E2NT
proved difficult to crystallise, and a number of different constructs were made and
overexpressed before crystallisation with residues 1 to 201 was achieved, but even
this construct possessed limited stability. The protein had to be crystallised within
2-3 days of purification; crystals grew within about 48 hours but only retained useful
diffraction quality for a further 2-3 days. This necessitated that crystals be rapidly
vitrified in cryoprotectant buffer and stored for use as soon as detector time became
available 16.
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Crystals of E2NT belong to the space group P3]21 with unit-cell dimensions 
a=b=54.3 A, c=155.5A. The structure was determined using two heavy atom 
derivatives and refined with data extending to 1.9 A spacing (Fig. 2a). The main 
5 chain is well defined throughout with the exception of residues 125 and 126 which 
are in an exposed loop and are mobile. There was density for the last residue of the 
His-tag at the N-terminus, but none for the remainder of this entity. All amino acids 
lie in the allowed regions of the Ramachandran (<|>,\j/) plot 17 with 92.4% in most 
favoured regions 18 . 

10 

The transactivation module is composed of two domains, N1 and N2, arranged so as
to give it an overall L-shaped appearance. Analysis of the PDB 19 using DALI 20 shows
that both have unique organisation of their secondary structures. Domain N1, which
forms the N-terminus of the intact E2, is composed of residues 1 to 92, which fold
into three long α-helices, Figure 2 (b,c). There is a tight loop between α1 and α2 and
a more extended one between α2 and α3. The three helices pack antiparallel to one
another in the form of a twisted plane, with angles of about 20° and 25° between the
pairs of consecutive helices. DALI indicated a maximum Z-score of 5.7, which could
suggest a significant correlation, for colicin Ia, a membrane protein which contains
three 80 Å long α-helices arranged more or less coplanar 21 . This is the only other
known protein that contains a true domain made up of such a packing of three
helices. In addition there were 42 other structures which gave Z-scores above 4.0,
most of which were four-helix bundles, such as bacterioferritin 22 . However, in these
only two of the three N1 helices superimposed simultaneously on two, not always
adjacent, bundle helices as a result of a more planar arrangement of helices within
N1. The indications are that the similarities observed reflect the optimum stacking
angle of antiparallel helices against one another rather than suggesting a common
ancestor for the evolution of these molecules.

Domain N2 is made up of residues 110 to 201 and is composed almost entirely of
antiparallel β structure, with only one short helical segment from residues 171 to 178,
Figure 2 (b,c). The secondary structure has two short three- and four-stranded
antiparallel β pleated sheets interconnected by two-stranded β ribbons. For this
domain DALI failed to identify any significant homologies to known structures, with
a highest Z-score of only 2.1. From the analysis of Harris and Botchan 15 and the
present study, the N2 fold appears to be novel.

The structure between the N1 and N2 domains (residues 93 to 109) contains two
consecutive single turns of helical structure, resulting in a compact and tight turn. It
packs closely against elements of both domains and is not a truly independent
structural domain. Rather it forms a fulcrum in the L-shape formed by N1 and N2
where it could act as a hinge, allowing the two domains to change their relative
conformation in a specific way. Several of the interactions between adjacent regions
of chain in the fulcrum are mediated indirectly through H-bonds involving water
molecules, suggesting the possibility of flexibility.

One of the most striking features of the crystal structure is the association of two
E2NT monomers into a tight dimer. The two E2NT monomers pack around the
crystallographic 2-fold axis, as shown in Figure 2a. The dimer interface is formed
mostly by amino acids from helices α2 and α3 of the N1 domain and by residues
142-144 from the N2 domain. The total buried surface area between the two E2NT is
2026 Å², comparable to the 2444 Å² buried between the two E2CT, which are
known to form a tight dimer with a Kd of 3-6 x 10* M 23,24 .

In the E2NT dimer interface, each subunit contributes a cluster of seven equivalent
residues, invariant or conserved in the 86 known sequences of E2 11 , with many direct
and water-mediated hydrogen bonds and rather few non-polar contacts, Fig. 2.
Analysis of the dimer-forming surfaces shows that all the direct hydrogen bonds
between monomers are made through these seven amino acids. For the invariant
Arg37, all possible side-chain hydrogen bonds are made and all are well defined,
Figure 2. Three of them are across the dimer interface. One hydrogen bond is
critical, from NH2 to the main chain carbonyl oxygen of Leu77. A second hydrogen
bond from NH2 is to OG1 of Thr81; in five out of 86 sequences this residue is
glutamine, and modelling shows a hydrogen bond is possible to the NE of Arg37.
The NH1 of Arg71 H-bonds to the OE1 of residue 80, which is Glu or Gln in all but
six variants. At the NE of Arg37 there is an ideal H-bond to water that itself makes
another strong H-bond across the dimer interface to the main-chain carbonyl oxygen
of residue 142. The role of the invariant Ile73 is the filling of the intersubunit non-polar
volume made up of the aliphatic parts of Arg37, Gln76 and of Leu77 - in this
case from both monomers. The Leu77 is in a few sequences substituted by valine or
isoleucine and in 9 out of 86 known sequences by methionine. Inspection of the
structure shows that Leu77 is partially exposed to the solvent and therefore different
hydrophobic side chains could be easily accommodated at this site. Another
important non-polar side chain is Ala69. Its side chain methyl packs into the surface
of the other monomer at van der Waals distance from the main chain of residue 142.
The only observed mutation of Ala69 is to Gly, and is easily accommodated. Gln76
is conserved or has homologous substitutions in about 2/3 of E2 sequences; in about
1/4 of the sequences there is methionine or valine at this position 11 . Although
hydrophobic substitutions of Gln76 would disrupt the hydrogen bonding to Glu80
across the dimer interface, and to Arg37 from the same subunit, the hydrophobic side
chain at residue 76 could instead make a compensating hydrophobic interaction with
the adjacent intersubunit hydrophobic pocket formed by Ile73 and Leu77.

Modelling of the amino acid variations in the 86 known papillomavirus E2 proteins
into the other contacts at the dimer interface shows that they generally can be
accommodated (data not shown). The consistency of the hydrogen bonds and van der
Waals contacts at the monomer-monomer interface in the various sequences suggests
therefore that the E2NT dimer interactions are potentially present in all
papillomaviruses.

The first experimental evidence for the E2NT dimerisation in the presence of DNA
with multiple E2-binding sites was provided by Knight et al in 1991 8 . Their studies
showed that intact E2 led to the formation of DNA loops on templates with widely
separated E2 binding sites, while a truncated E2, containing the DNA-binding E2CT
but missing the N-terminal 161 residues, did not. Such dimerisation is further
supported by the observed synergistic transcription activation by a complex of two
DNA-bound E2 dimers 25 .

To analyse the functional behaviour of the E2NT dimers further, we measured the
dissociation constant by sedimentation equilibrium using analytical
ultracentrifugation of recombinant E2NT protein containing the 201 N-terminal
amino acids. A value of Kd ~ 8.1 ± 4 x 10⁻⁶ M was obtained, indicating medium-strength
association. The micromolar range of the E2NT dimer Kd is certainly
physiologically significant, and compares well with values for other transcription
factors which have relatively low dissociation constants, often with the Kd values
between 1 µM and 20 µM 26,27 . In vivo, the interaction could be enhanced when the
two E2NT modules are placed in close proximity. Indeed, E2CT forms dimers which
bind to the multiple DNA-binding sites located within the URR of viral DNA with
Kd of protein:DNA interactions usually in the nanomolar range 28 . Consequently, the
local concentration of E2NT, bound to the E2CT via the non-conserved, flexible ~80
amino-acid linker, is effectively increased.
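
To give a feel for what a dissociation constant in this range implies, the following sketch computes the fraction of E2NT chains present as dimer for a simple monomer-dimer equilibrium, 2M <=> D with Kd = [M]²/[D]. Only the Kd value of about 8.1 x 10⁻⁶ M comes from the measurement above; the function name, the ideal two-state model and the example concentrations are assumptions made for illustration.

```python
import math

KD_E2NT = 8.1e-6  # mol/L, sedimentation-equilibrium value reported above


def dimer_fraction(total_chain_conc, kd=KD_E2NT):
    """Fraction of protein chains found in dimers for 2M <=> D, Kd = [M]^2/[D].

    total_chain_conc is the total concentration of chains in mol/L,
    i.e. [M] + 2[D]. Hypothetical helper, for illustration only.
    """
    if total_chain_conc <= 0:
        return 0.0
    # Mass balance gives 2[M]^2 + Kd[M] - Kd*C = 0; take the positive root.
    free_monomer = (-kd + math.sqrt(kd * kd + 8.0 * kd * total_chain_conc)) / 4.0
    return 1.0 - free_monomer / total_chain_conc


# Example concentrations are arbitrary choices spanning the micromolar range.
for conc in (1e-7, 1e-6, KD_E2NT, 1e-4):
    print(f"{conc:.1e} M total -> {dimer_fraction(conc):.2f} of chains in dimer")
```

Under this idealised model half of the chains are dimeric when the total chain concentration equals Kd, and any increase in local concentration, such as that proposed for DNA-tethered E2, shifts the equilibrium further towards the dimer.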

E2NT dimer interactions, as seen in the crystal structure, could form either between
modules which are already part of a single E2 dimer, formed as a result of E2CT
dimerisation interactions and bound to a single E2 binding site on the DNA (Fig. 3a),
or between two preformed E2 dimers located on different E2 binding sites (Fig. 3b).
The results of the electron microscopy suggest that the latter dimerisation does
occur 8 . Although no direct experimental evidence exists for the former dimerisation,
it does also seem possible due to the flexibility of the linker connecting the two
modules. We propose that E2 molecules may initially keep their N-terminal modules
within their internal dimers, but swap N-terminal modules and cross link to E2
molecules bound to distant DNA binding sites to form active loop structures during
transcriptional activation and / or HPV DNA replication (Figure 3d). As discussed
below, the effects of mutations on transcriptional transactivation can be explained in
terms of the dimer being an essential element in this process.

E2 is a regulator of both transcription and viral DNA replication and thus interacts
with other viral and host macromolecules in the infected cell. Indication of the
possible importance of individual residues in the function comes firstly from the
structure, secondly from the extensive set of sequences of the papillomaviral E2s
and thirdly from mutagenesis studies on the individual proteins. In the following we
make a primary attempt to map the molecule's function onto its structure.

The pattern of amino acid conservation for the 86 available papilloma sequences 11
has been analysed using the GCG program suite 29 . The sequences exhibit striking
variation, characteristic of some virus families. However, 33 of the total 201
residues in the E2NT construct were totally or highly conserved. Fig. 4a illustrates
the distribution of these 33 residues in the dimer. These were categorised into two
sets: those with an essentially structural role and those exposed on the surface with a
potential for intermolecular interactions. Thirteen residues (Fig. 1b) are buried or
play a purely structural role within the monomer; they are not expected to be of
functional importance and will not be discussed here.

A further 12 of these 33 residues stand out as having a structural role in the interface
of the N1 and N2 domains. They form three clusters, the first making direct
interactions between the two domains (Ile82, Glu90, Trp92, Lys112, Tyr138,
Val145) and two separate sets of interactions, one from N2 (Pro106, Lys111, Phe168,
Trp134) and the other from N1 (Trp33, Leu94) to the structure connecting them,
referred to here as a fulcrum. The first two clusters are shown in Figure 4 b, c and it
can be seen that Lys111 and Lys112 play key roles. Their side chains point in
opposite directions to one another and their terminal amino groups are involved in
near ideal patterns of hydrogen bonds. The flat surfaces of their extended side chains
stack against Trp134 and Trp92, respectively. This clustering of invariant residues at
the interface indicates a functional importance for the relative orientation of N1 and
N2. The fulcrum could indeed provide a flexible pivot between the two domains, but
there is no direct evidence for this as yet. Finally, while the side chain of Glu90 is
held tightly in place by two H-bonds and could have a structural role, its OE2 atom is
exposed on the surface and is surrounded by near invariant side-chains, which may
thus play a part in interactions with other molecules.



Of the remaining eight conserved residues, mutational substitutions of Glu20,
Glu100 and Asp122 30-33 had moderate effects on the transactivation and replication
properties of E2, which depended on the particular viral strain. Glu20 lies on the top
surface of N1. Asp122 lies far away on the distal surface of N2. Glu100 is
completely exposed and points into the solvent at the junction of the L between the
N1 and N2 domains. The functional role of these amino acids has yet to be clarified.

Three conserved amino acids (Arg37, Glu39 and Ile73) have been subjected to point
mutation and the effects on the two principal functions of E2, i.e. transactivation and
HPV DNA replication, have been assessed (reviewed in 4 , also 31,34,35 ). Together with
the remaining two conserved amino acids, Gln12 and Ala69, these residues form two
functionally important surfaces (see below).

Finally, a number of the mutational results (reviewed in 4 , also 31,34,35 ) correspond to
residues that can be assigned to structural roles. Substitution of these residues will
lead to substantial conformational changes and a probable inability to fold correctly.
This is particularly true for some of the deletion mutants involving the core of the
molecule. Knowledge of the structure will allow a more rational choice and design
of mutants in the future.



The induction of DNA loops by E2NT dimerisation could be important for the
construction of the active transcription bubble by targeting DNA-binding
transcription factors, bound at distal sites, to the region proximal to the start of
transcription (reviewed in 36 ). In support of this, residues Arg37, Ile73 and Gln76
map onto the surface of E2NT involved in dimer formation, and mutations result in
considerable disruption of transactivation, while having little effect on replication
4,15,31 . The structure also shows that Ala69, which points its side chain methyl across
the dimer interface, is also critical for transactivation. Mutational substitutions to
amino acids with longer side chains should have a knock-out effect on E2NT dimer
formation and consequently on transactivation.

The sites of association with cellular transcription factors AMF-1 (residues 74-134)
and TFIIB (134-216) were previously mapped onto the E2NT module (Figure 1)
using a series of deletion mutants as well as point mutations 34,35 . These sites were
mutually exclusive. In the structure, residues 74-134 include the fulcrum, while
residues 134-216 correspond to domain N2. Further biochemical and structural
studies can now be planned to characterise these interactions in more detail.

Replication of the viral genome is initiated by binding of another viral protein, E1, to
the origin of DNA replication 4 , which is itself flanked by two E2 binding sites, Fig.
3a. While the function of E2CT dimers is to bind specifically to the DNA sites,
E2NT interaction with E1 enhances the binding of E1 to this region. Mutational
substitutions of Glu39 generally retained transcriptional activation while DNA
replication was substantially reduced 31-33,37 . In the structure, the conserved Glu39
makes every possible hydrogen bond by its side chain carboxyl oxygens, Fig. 4d.
One hydrogen bond is to NE2 of Gln12, which is absolutely conserved in all known
sequences of E2. The other three hydrogen bonds are to the water molecules which
are part of an intimate net of well-defined water molecules surrounding Glu39 and
mediating its interactions with adjacent residues. Interestingly, a number of these
protein interactions with water molecules are conserved as they are made to the
protein backbone, including carbonyl oxygens of Gln12, Met36 and Lys68. While
mutation of Gln12 in BPV1 only slightly affected both transactivation and
replication, it substantially reduced cooperative origin binding 30,32 . The close
positioning of Gln12 and Glu39 in the three-dimensional structure further supports
the notion that these two residues are involved in interactions with E1. The conserved
set of interactions at Gln12/Glu39 suggests that the main chain carbonyl oxygens of
Gln12 and Met36 and the conserved water molecules could also be involved in these
interactions. Gln12/Glu39 are surrounded by Leu8, Ile15, Met36, Tyr43, Gln57 and
Lys68, which are unlikely to contribute to E2/E1 interactions, as these residues are
not well conserved in E2 sequences from different papillomaviruses.

The Gln12/Glu39 cluster lies on a side of the N1 domain which is opposite to the
side involved in transactivation (and dimerisation), Figure 2c. Notably, the spatial
separation of the two functionally important surfaces suggests that the E2NT module
could be able to interact with E1 at the same time as it interacts through the
dimerisation interface with another E2NT module.

The structure reported here for the entire E2 transactivation module has several
implications for understanding of E2 function. It is now possible to map known
mutations onto the E2 three-dimensional structure, and to use the knowledge of
amino acid conservation and the effects of mutations to assign roles in folding,
structure and function to residues. To this end, our results indicate that molecular
surfaces involved in transactivation and E1-binding are located at opposite sides of
the N1 domain of E2NT, suggesting that both surfaces could be accessed
simultaneously by other protein factors. In line with these observations, E1 has been
shown to modulate transactivation by directly interacting with E2, leading to
repression of transactivation in the presence of excess E1 38 . It is not inconceivable
that the docking of E2NT dimer with E1 is sufficient to block further association
with other target proteins.

The structure shows that the transactivation surface is involved in the formation of
the E2NT dimer, which could cross-link E2 molecules bound by their E2CT modules
to well-separated DNA sites. Inevitably, such dimerisation would cause DNA to
form a loop structure, targeting distally bound transcription factors to regions close to
the promoter. While this process has been suggested to be essential for
transactivation 36 , the definition of interacting surfaces between E2 and other cellular
transcription factors requires a great deal of further study.

Our results suggest that the process of DNA loop formation could involve swapping
of E2NT modules across E2 dimers bound at separated DNA sites (Fig. 3a-d). The
polar components of the monomer-monomer interactions may favour such exchange.
Domain swapping is a well-recognised phenomenon that occurs relatively frequently
between two individual monomers containing domains connected by a flexible linker
39,40 . E2 is to our knowledge the first example where the swapping event is predicted
to occur between dimers.



The dimerisation surface of E2 represents a good target for designing anti-viral drugs,
since it is essential for viral transcription, there is no homologous human protein and
the residues forming the interface are highly conserved among different viral strains.
Dynamic interactions between transcription factors play a central role in the
regulation of transcription and replication. Dimerisation, heterodimerisation and the
monomer-to-dimer transition may play important roles during the control of the
papillomavirus life cycle. These processes themselves can be regulated through
phosphorylation, proteolysis, interaction with small ligands or changes in their
intracellular concentration. It has been suggested that E2 can regulate the switch
between early gene expression and viral genome replication during HPV infection 41 .
It is possible that dimerisation of E2NT modules plays an essential role during this
process. One scenario would be to activate transcription via induction of DNA loop
formation at early stages of the viral life cycle. At later stages, when the
concentration of expressed E2 proteins within the cell becomes high and comparable
with the Kd for E2 dimer formation, free E2NT modules could compete for
dimerisation with those involved in DNA loop formation and titrate them away,
switching off transcription and stimulating replication. It is also possible that other
protein factors could be involved in this process, including, for example, E1.

The invention therefore includes the use of the E2NT crystal structure in the design of
anti-viral drugs, since it is essential for viral transcription. In the rationalised
computational design of drugs using the crystal structure, computational analyses are
therefore necessary to determine whether a molecule or the E2NT-binding portion
thereof is sufficiently similar to the E2NT structure. Such analyses may be carried
out in current software applications, such as the Molecular Similarity application of
QUANTA (Molecular Simulations Inc., Waltham, Mass.) version 3.3, and as
described in the accompanying User's Guide, Volume 3, pages 134-135.

The Molecular Similarity application permits comparisons between different
structures, different conformations of the same structure, and different parts of the
same structure. The procedure used in Molecular Similarity to compare structures is
divided into four steps: 1) load the structures to be compared; 2) define the atom
equivalences in these structures; 3) perform a fitting operation; and 4) analyze the
results.

Each structure is identified by a name. One structure is identified as the target (i.e.,
the fixed structure); all remaining structures are working structures (i.e., moving
structures). Atom equivalency within QUANTA is defined by user input and, for the
purpose of this invention, equivalent atoms may be defined as protein backbone atoms
(N, Cα, C and O) for all conserved residues between the two structures being
compared. We will also consider only rigid fitting operations.

When a rigid fitting method is used, the working structure is translated and rotated to
obtain an optimum fit with the target structure. The fitting operation uses a least
squares fitting algorithm that computes the optimum translation and rotation to be
applied to the moving structure, such that the root mean square difference of the fit
over the specified pairs of equivalent atoms is an absolute minimum. This number,
given in angstroms, is reported by QUANTA.
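
In symbols, and as a sketch rather than a description of the QUANTA implementation, rigid fitting of N equivalent atom pairs seeks the rotation R and translation t that minimise the root mean square difference between the target coordinates y_i and the transformed working coordinates x_i. The symbols are introduced here for illustration only.

```latex
\mathrm{RMSD}(R,\mathbf{t}) =
  \sqrt{\frac{1}{N}\sum_{i=1}^{N}
        \left\lVert R\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \right\rVert^{2}},
\qquad
(R^{*},\mathbf{t}^{*}) =
  \arg\min_{R \in SO(3),\ \mathbf{t} \in \mathbb{R}^{3}} \mathrm{RMSD}(R,\mathbf{t})
```

For any fixed rotation the optimal translation superimposes the centroids of the two atom sets, so the search reduces to finding the best rotation.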

For the purpose of one class of embodiments of this invention, any set of structure
coordinates of a molecule or molecular complex that has a root mean square
deviation of conserved residue backbone atoms (N, Cα, C, O) of less than 1.5
Å when superimposed, using backbone atoms, on the relevant structure
coordinates of E2NT is considered identical. More preferably, the root mean square
deviation is less than 1.0 Å. Most preferably, the root mean square deviation is
less than 0.5 Å.
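
The thresholds above can be applied as in the following minimal sketch, which assumes that the two structures have already been superimposed by a rigid fit and that the backbone atoms (N, Cα, C, O) of conserved residues are supplied as paired coordinate lists. The function names, the category labels and the toy coordinates are hypothetical and are not part of QUANTA.

```python
import math


def rmsd(target, working):
    """Root mean square difference (in Å) over paired (x, y, z) tuples."""
    if len(target) != len(working) or not target:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(target, working):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(target))


def similarity_class(value):
    """Map an RMSD in Å onto the three preference levels stated in the text."""
    if value < 0.5:
        return "most preferred (< 0.5 Å)"
    if value < 1.0:
        return "more preferred (< 1.0 Å)"
    if value < 1.5:
        return "identical for this class of embodiments (< 1.5 Å)"
    return "not considered identical"


# Toy example (made-up coordinates): every atom displaced by 0.3 Å along x.
target = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.2, 0.0), (3.1, 1.9, 0.4)]
working = [(x + 0.3, y, z) for x, y, z in target]
value = rmsd(target, working)
print(f"{value:.2f} Å: {similarity_class(value)}")
```

The sketch deliberately omits the fitting step itself; in practice the working coordinates would first be rotated and translated onto the target before the deviation is evaluated.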

The term "root mean square deviation" means the square root of the arithmetic mean
of the squares of the deviations from the mean. It is a way to express the deviation or
variation from a trend or object. For purposes of this invention, the "root mean square
deviation" defines the variation in the backbone of a protein from the backbone of
E2NT or a dimerising portion thereof, for example as defined by the structure
coordinates of E2NT described herein.

The term "least squares" refers to a method based on the principle that the best 
estimate of a value is that in which the sum of the squares of the deviations of 
observed values is a minimum. 

Materials and Methods

Purification and crystallisation. 

Details of the purification and crystallisation of E2NT have been described
previously 16 . Briefly, the ORF encoding the N-terminal 201 residues of HPV-16 E2
was cloned into the prokaryotic expression plasmid pET15b downstream of the 20-residue
His-tag leader sequence; protein was expressed in E. coli BL21(DE3)pLysS
and purified using nickel affinity and anion exchange chromatography. Crystals were
obtained by hanging drop vapour diffusion with 0.8-1.2 M ammonium sulphate, 0.1 M
triethanolamine pH 8.0-8.3 and 3-5% 2-methyl-2,4-pentanediol. Crystals grew only
with very fresh protein preparations and deteriorated in terms of diffraction quality in
less than a week. This necessitated freezing and storage of crystals in liquid nitrogen
immediately after growth, as discussed above.

Structure determination. 

All data were recorded on cryogenically frozen crystals. A native crystal was frozen
for which initial data were recorded to 3.4 Å 16 . For the screening of derivatives,
crystal stability was even more limiting. Nine crystals were soaked in various heavy
atom reagents immediately after growth. The crystals were screened in-house using a
MAR Research imaging plate on a Rigaku RU200 rotating anode source, by recording
3° of data for each and analysing the fractional isomorphous difference from the
native. Three derivatives showed promising differences from the native, in the range
of 15-20% after scaling using SCALEPACK 42 , and were stored in liquid nitrogen.
The native crystal was transported to EMBL Hamburg where 1.9 Å data were
measured using synchrotron radiation from beam line X11, Table 1. In addition data
were recorded at EMBL for the three promising derivatives to about 2.7 Å. Two of
these derivatives proved useful in phase determination and the structure was solved
by multiple isomorphous replacement with anomalous scattering (MIRAS) at 2.7 Å.
The two derivatives were solved independently using the CCP4 suite 43 from the
difference Patterson synthesis and by direct methods as implemented in SHELX 44 .
Both contained a single heavy atom site. Phases, calculated using MLPHARE, were
enhanced by solvent flattening 45 using a solvent content of 50 %. The resulting high
quality density map was easily interpretable and the initial model was built using
QUANTA (Molecular Simulations) for all but four residues of the construct, ignoring
the His-tag. The model was completed with REFMAC (resolution 20-1.9 Å) using a
bulk solvent correction, to an R-factor of 23.3 % (Rfree 29.7 % for 5 % of the data).
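
The solvent content of about 50 % used for solvent flattening can be cross-checked with a Matthews-type estimate from the unit cell. The sketch below is a rough illustration and not the authors' procedure: the cell dimensions are those of the coordinate header in Table 3, while the number of chains per cell, the average residue mass and the constant 1.23 are assumptions stated in the code.

```python
import math

# Unit cell from the coordinate header in Table 3 (Å, degrees).
A, B, C = 54.68, 54.68, 155.73
GAMMA_DEG = 120.0

# Assumptions, not taken from the text:
Z = 6                    # general positions in P3(1)21, one chain per asymmetric unit
RESIDUES = 221           # His-tagged construct length quoted in the text
DA_PER_RESIDUE = 110.0   # rough average residue mass in Da


def cell_volume(a, b, c, gamma_deg):
    """Volume (Å^3) of a cell with alpha = beta = 90 degrees."""
    return a * b * c * math.sin(math.radians(gamma_deg))


def solvent_fraction(volume, z, mass_da):
    """Matthews estimate: Vm = V / (Z * M), solvent fraction ~ 1 - 1.23 / Vm."""
    vm = volume / (z * mass_da)
    return 1.0 - 1.23 / vm


volume = cell_volume(A, B, C, GAMMA_DEG)
fraction = solvent_fraction(volume, Z, RESIDUES * DA_PER_RESIDUE)
print(f"cell volume {volume:.0f} Å^3, estimated solvent fraction {fraction:.2f}")
```

With these assumptions the estimate comes out close to one half, in line with the value used for solvent flattening; a different assumed chain mass would shift it by a few per cent.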

There are 221 residues in the recombinant protein: the first twenty comprise the His-tag.
The final model contains all but two of the 201 residues of the real protein:
residues 125-126 are disordered and lie in a flexible surface loop. Only one residue,
His0, of the His-tag has clear density and an ordered conformation. In addition there
are 187 water molecules, which were selected using ARP 46 during the course of
refinement. The main statistics of the refined model are shown in Table 2.

Analytical ultracentrifugation.

Experiments were carried out in an Optima XL-A ultracentrifuge (Beckman-Coulter,
CA, USA) using scanning UV optics. During the experiments, the recombinant
E2NT was in 10 mM Tris-HCl pH 8.0, 5 mM DTT, 0.2 mM EDTA, 300 mM NaCl.
Data were obtained at rotor speeds of 12,000 and 16,000 rpm, and the time to
equilibrium was 10-12 hours. All runs were carried out at 293 K, and all radial scans
were at a wavelength of 280 nm. Dissociation constants were obtained by nonlinear
regression using the Beckman ultracentrifuge software.

P32059WO
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Table 1 



Data and phasing

Data set                                 Native          UAC             AuCN
Space group                              P3₁21                           P3₁21
a, b (Å)                                 54.68                           54.58
c (Å)                                    155.73          155.66          156.50
Resolution (Å)                           30-1.9          20-2.7          20-2.7
Temperature, K                           120             120             190
Wavelength (Å)                           0.86            0.86            0.86
Unique reflections                       21751           7873            7937
Completeness (%) (outer shell)           98.8 (89.3)     99.8 (96.1)     99.7 (93.8)
R-merge (outer shell)                    0.058 (0.339)   0.073 (0.271)   0.061 (0.268)
Phasing power (centric / acentric)                       1.55 / 2.07     0.95 / 1.40

FOM: MIRAS                               0.59
FOM: DM 20-2.7 Å (2.7-1.9 Å)             0.88 (0.61)
DM: Mean phase change (20-2.7 Å)         32°
R-factor (FreeR)                         0.223 (0.295)
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Table 2 

Refinement and model correlation

Resolution                                            1.9-10.0 Å
Number of protein atoms                               1622
Number of solvent sites                               211
Number of reflections used in refinement              20637
Number of reflections used for Rfree calculation      1111
R-factor *                                            0.232
Rfree *                                               0.305
Average atomic B-factor, Å²
    protein atoms                                     38.0
    water molecules                                   48.5
R.m.s. deviations from ideal geometry (Å). Targets in parentheses
    bond distance                                     0.013 (0.020)
    angle distance                                    0.026 (0.040)
    chiral volume                                     0.142 (0.200)

* Crystallographic R-factor, R(free) = Σ ||Fo| - |Fc|| / Σ |Fo|
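
The R-factor definition in the footnote can be written out as a short calculation. The sketch below is illustrative only: the function name and the four toy amplitudes are invented, and real refinement programs apply scaling between observed and calculated amplitudes that is omitted here. Rfree uses the same expression, evaluated only over the reflections set aside from refinement.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(||Fo| - |Fc||) / sum(|Fo|) over paired amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    return numerator / denominator


# Toy amplitudes (made up): differences of 10, 5, 6 and 2 over a total of 280.
observed = [100.0, 80.0, 60.0, 40.0]
calculated = [90.0, 85.0, 66.0, 38.0]
print(f"R = {r_factor(observed, calculated):.3f}")
```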
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Table 3 



CRYST1   54.680   54.680  155.730  90.00  90.00 120.00 P3121
SCALE1      0.01829  0.01056  0.00000        0.00000
SCALE2      0.00000  0.02112  0.00000        0.00000
SCALE3      0.00000  0.00000  0.00642        0.00000
ATOM      1  N   HIS A   0       5.469 -26.512  52.262  1.00 61.92
ATOM      2  CA  HIS A   0       6.434 -25.669  51.568  1.00 61.84
ATOM      3  C   HIS A   0       6.263 -25.743  50.051  1.00 53.91
ATOM      4  O   HIS A   0       6.089 -24.713  49.607  1.00 69.59
ATOM      5  CB  HIS
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